
Add robust eigh_v2 problem #163

Draft
msaroufim wants to merge 1 commit into main from qr-v2-conditioning-hardening


msaroufim (Member) commented Jun 30, 2026

Summary

Validation

  • python3 -m py_compile problems/linalg/eigh_v2/eval.py problems/linalg/eigh_v2/reference.py problems/linalg/eigh_v2/task.py problems/linalg/eigh_v2/submissions/torch_eigh.py problems/linalg/eigh_v2/submissions/triton_diagonal_fast_path.py
  • /Users/mark/Dev/kernelbot/.venv/bin/ruff check problems/linalg/eigh_v2
  • git diff --check
  • Local-only KernelBot debug setup against kernelbot_eigh_v2_debug on 127.0.0.1, with the local checkout registered through PROBLEM_DEV_DIR / PROBLEMS_REPO.
  • Baseline torch_eigh.py local submissions on B200:
    • test: 41/41 pass
    • benchmark: 10/10 pass
    • real leaderboard submission (after the repeat-budget fix): pass, about 116 s end-to-end locally; recorded phase durations were 7-10 s for test, 34-44 s for benchmark, and 44-48 s for leaderboard.
  • Adversarial local submissions:
    • A Tensor subclass/output deferral attempt failed in the evaluator with the error "Q must be a plain torch.Tensor".
    • Cache/replay and harness timing patch attempts were rejected by KernelGuard on the normal local API.
    • With KernelGuard disabled (only on a separate local debug API), cache/replay still failed the evaluator's correctness check, and forged CUDA-event timing was rejected by the new physical roofline floor.
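
The cache/replay and forged-timing results above line up with two evaluator defenses described in the commit message: regenerating inputs for every scored benchmark iteration and rejecting physically impossible reported times. A minimal sketch of how such defenses can work, under loud assumptions: make_input, run_scored, physical_floor_s, and the 8 TB/s bandwidth figure are hypothetical illustrations, not the eigh_v2 evaluator's code. The real evaluator times GPU kernels with CUDA events; this stand-in uses time.perf_counter and pure-Python kernels so it runs anywhere, and its floor uses only a bandwidth bound (read A once, write Q and the eigenvalues once).

```python
import random
import time
from typing import Callable, List

Matrix = List[List[float]]

# Hypothetical constant: assumed peak memory bandwidth of the GPU, in bytes/s.
ASSUMED_PEAK_BYTES_PER_S = 8e12


def eigh_min_bytes(n: int, itemsize: int = 4) -> int:
    """Lower bound on memory traffic for one n x n eigh call:
    read A once, write Q (n x n) and the n eigenvalues once."""
    return itemsize * (2 * n * n + n)


def physical_floor_s(n: int, peak_bytes_per_s: float = ASSUMED_PEAK_BYTES_PER_S) -> float:
    """No kernel can finish faster than moving its minimum bytes at peak bandwidth."""
    return eigh_min_bytes(n) / peak_bytes_per_s


def make_input(n: int, seed: int) -> Matrix:
    """Fresh random symmetric n x n matrix for one scored iteration."""
    rng = random.Random(seed)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            a[i][j] = a[j][i] = rng.gauss(0.0, 1.0)
    return a


def run_scored(
    kernel: Callable[[Matrix], object],
    check: Callable[[Matrix, object], bool],
    n: int,
    iters: int,
    base_seed: int,
    timer: Callable[[], float] = time.perf_counter,
) -> List[float]:
    floor = physical_floor_s(n)
    times = []
    for it in range(iters):
        # New data every iteration, so an answer cached from an earlier
        # iteration (cache/replay) no longer matches the current input.
        a = make_input(n, base_seed + it)
        start = timer()
        out = kernel(a)
        elapsed = timer() - start
        if not check(a, out):
            raise RuntimeError(f"iteration {it}: incorrect output")
        # A reported time below the physical floor cannot be genuine.
        if elapsed < floor:
            raise RuntimeError(
                f"iteration {it}: {elapsed:.3e} s is below the {floor:.3e} s floor"
            )
        times.append(elapsed)
    return times
```

With the assumed 8 TB/s, n = 512 in float32 gives a floor of 2,099,200 bytes / 8e12 B/s, about 0.26 µs; a full roofline floor would also take the compute-bound term into account and add a safety margin.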

Provenance

Resolved problem directory: problems/linalg/eigh_v2. Ranked/profile shapes come from the benchmarks: section of eigh_v2/task.yml. Profile mode wraps the submitted kernel in the upstream custom_kernel NVTX region. Reference-kernels base used for this PR: origin/main at 4a1153e, with this branch at 9bcefc4.

msaroufim force-pushed the qr-v2-conditioning-hardening branch from 40ca746 to b208d15 on June 30, 2026 22:31
Add a separate eigh_v2 leaderboard that leaves the existing eigh problem untouched while carrying over the stricter checker and benchmark-integrity hardening from the open eigh follow-ups.

The v2 evaluator regenerates inputs for scored benchmark iterations, rejects physically impossible reported times, and keeps profile mode from the current upstream evaluator. The v2 checker requires plain tensor outputs and adds an explicit eigenvalue comparison against torch.linalg.eigvalsh(A).
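
A minimal sketch of the two checker rules described above, using NumPy as a stand-in for PyTorch: numpy.linalg.eigvalsh plays the role of torch.linalg.eigvalsh(A), and `type(x) is np.ndarray` plays the role of requiring a plain torch.Tensor. The function name check_eigh_outputs, the rtol value, and the residual/orthogonality checks are hypothetical assumptions for illustration, not the v2 checker's actual code. It also assumes eigenvalues come back in ascending order, as eigvalsh returns them.

```python
import numpy as np


def check_eigh_outputs(A: np.ndarray, w, Q, rtol: float = 1e-4) -> None:
    """Hypothetical checker: raise AssertionError unless (w, Q) is an
    acceptable eigendecomposition of the symmetric matrix A."""
    # 1. Plain outputs only: an exact type check rejects subclasses (which
    #    could defer or fake the work), whereas isinstance() would accept them.
    for name, x in (("w", w), ("Q", Q)):
        if type(x) is not np.ndarray:
            raise AssertionError(f"{name} must be a plain numpy.ndarray")
    n = A.shape[0]
    if w.shape != (n,) or Q.shape != (n, n):
        raise AssertionError("unexpected output shapes")

    # Scale tolerances by ||A||_2 so they stay meaningful for any matrix size.
    scale = float(np.linalg.norm(A, ord=2)) or 1.0
    tol = rtol * scale

    # 2. Explicit eigenvalue comparison against an independent reference.
    w_ref = np.linalg.eigvalsh(A)
    if np.max(np.abs(w - w_ref)) > tol:
        raise AssertionError("eigenvalues differ from eigvalsh(A)")

    # 3. Residual ||A Q - Q diag(w)||_2 (Q * w scales column j by w[j])
    #    and orthogonality ||Q^T Q - I||_2.
    if np.linalg.norm(A @ Q - Q * w, ord=2) > tol:
        raise AssertionError("residual too large")
    if np.linalg.norm(Q.T @ Q - np.eye(n), ord=2) > rtol:
        raise AssertionError("Q is not orthogonal")
```

For example, outputs from np.linalg.eigh(A) pass, while Q.view(SomeSubclass) or a perturbed eigenvalue vector raises.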

The ranked set is trimmed to ten cases and repeats the central 512x512 shape across dense, mixed, rank-deficient, clustered, and row-scaled distributions, so choosing precision from the shape alone is less useful than inspecting the matrix's numerical properties.
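
A sketch of how one shape can carry very different numerical difficulty. The PR text only names the five distributions; the constructions below (what "mixed" or "row-scaled" means, the rank n // 4, the cluster centers, the scaling ranges) are hypothetical interpretations chosen for illustration, not the generators in eigh_v2. Row scaling is applied as D S D so the matrix stays symmetric.

```python
import numpy as np

# Hypothetical readings of the five distribution names in the PR text.
KINDS = ("dense", "mixed", "rank_deficient", "clustered", "row_scaled")


def _from_spectrum(rng: np.random.Generator, eigs: np.ndarray) -> np.ndarray:
    """Q diag(eigs) Q^T with a random orthogonal Q, so the spectrum is known."""
    n = eigs.shape[0]
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return (q * eigs) @ q.T


def make_case(kind: str, n: int = 512, seed: int = 0, dtype=np.float32) -> np.ndarray:
    rng = np.random.default_rng(seed)
    if kind == "dense":
        g = rng.standard_normal((n, n))
        a = (g + g.T) / np.sqrt(2 * n)
    elif kind == "mixed":
        # Eigenvalues of both signs spanning six orders of magnitude.
        mags = 10.0 ** rng.uniform(-6, 0, size=n)
        signs = rng.choice([-1.0, 1.0], size=n)
        a = _from_spectrum(rng, signs * mags)
    elif kind == "rank_deficient":
        # B B^T with B of shape n x (n // 4) has rank n // 4.
        b = rng.standard_normal((n, n // 4))
        a = b @ b.T / n
    elif kind == "clustered":
        # Eigenvalues packed into three tight clusters.
        centers = rng.choice([-1.0, 0.5, 2.0], size=n)
        a = _from_spectrum(rng, centers + 1e-6 * rng.standard_normal(n))
    elif kind == "row_scaled":
        # D S D: rows and columns rescaled together so A stays symmetric.
        g = rng.standard_normal((n, n))
        d = 10.0 ** rng.uniform(-3, 3, size=n)
        a = d[:, None] * ((g + g.T) / np.sqrt(2 * n)) * d[None, :]
    else:
        raise ValueError(f"unknown kind: {kind!r}")
    # Symmetrize exactly before casting; the cast keeps exact symmetry.
    a = (a + a.T) / 2
    return a.astype(dtype)
```

All five cases share the shape (512, 512), so a kernel that picks its precision from A.shape alone sees identical inputs, while the spectra range from well separated to rank-deficient or tightly clustered.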

Credit: this consolidates ideas and fixes from #156, #159, #160, and #161.

Co-Authored-By: Bryce Adelstein Lelbach <brycelelbach@gmail.com>
msaroufim force-pushed the qr-v2-conditioning-hardening branch from 133bded to 9bcefc4 on July 5, 2026 03:34
msaroufim changed the title from "Document QR v2 conditioning hardening" to "Add robust eigh_v2 problem" on Jul 5, 2026